07. Deciding on Metrics - Discussion
Re: Selecting Invariant and Evaluation Metrics
There's one invariant metric that really stands out here, and that's the number
of cookies that hit the homepage. If we've done things correctly, each visitor
should have an equal chance of seeing each homepage, and that means that the
number of cookies assigned to each group should be about the same. Since
visitors come in without any additional information (e.g. account info) and
the change effected by the experimental manipulation comes in right at the
start, there aren't other invariant metrics we should worry about.
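As a quick illustration, here is a minimal sketch of how we might check this invariant metric once data is collected. The cookie counts below are hypothetical, and the two-sided binomial test is just one reasonable choice for asking whether the observed split is consistent with a fair 50/50 assignment.

from scipy.stats import binomtest

# Hypothetical cookie counts; random assignment should split visitors
# roughly evenly between the two homepages.
n_control = 4925      # cookies shown the old homepage
n_experiment = 5075   # cookies shown the new homepage

# Two-sided test of whether the observed split is consistent with p = 0.5.
result = binomtest(n_control, n=n_control + n_experiment, p=0.5)
print(f"p-value for the group split: {result.pvalue:.3f}")
# A large p-value gives us no reason to doubt the random assignment.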
Selecting evaluation metrics is a trickier proposition. Count-based metrics at
other parts of the process seem like natural choices: the number of times the
software was downloaded and the number of licenses purchased are exactly what
we want to change with the new homepage. The issue is that even though we
expect the number of cookies assigned to each group to be about the same,
it's much more likely than not that they won't be exactly the same. Instead,
we should prefer the download rate (# downloads / # cookies) and the purchase
rate (# licenses / # cookies) as evaluation metrics. Dividing by the number of
cookies allows us to account for slight imbalances between groups.
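To make the two evaluation metrics concrete, here is a small sketch of how they could be computed per group. The counts and the helper names (download_rate, purchase_rate) are illustrative, not part of any given dataset.

def download_rate(n_downloads, n_cookies):
    # downloads per cookie that hit the homepage
    return n_downloads / n_cookies

def purchase_rate(n_licenses, n_cookies):
    # licenses purchased per cookie that hit the homepage
    return n_licenses / n_cookies

# Hypothetical per-group counts.
groups = {
    "control":    {"cookies": 4925, "downloads": 820, "licenses": 55},
    "experiment": {"cookies": 5075, "downloads": 910, "licenses": 64},
}
for name, g in groups.items():
    print(f"{name}: download rate = {download_rate(g['downloads'], g['cookies']):.4f}, "
          f"purchase rate = {purchase_rate(g['licenses'], g['cookies']):.4f}")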
As for the other proposed metrics, the ratio between the number of licenses and
number of downloads is potentially interesting, but not as direct as the other
two ratios discussed above. It's possible that the manipulation increases both
the number of downloads and the number of licenses, but increases the former at
a much higher rate. In this case, the licenses-to-downloads ratio could be worse
for the new homepage than for the old one, even though the new homepage has our
desired effects. There's no such inconsistency issue with the ratios that use
the number of cookies in the denominator.
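A small, hypothetical numeric example makes the inconsistency concrete: with the made-up counts below, both cookie-based rates improve under the new homepage, yet the licenses-to-downloads ratio falls.

# Hypothetical counts for each homepage; equal cookie totals for simplicity.
old = {"cookies": 10_000, "downloads": 500, "licenses": 50}
new = {"cookies": 10_000, "downloads": 800, "licenses": 60}

for name, g in [("old", old), ("new", new)]:
    print(f"{name}: download rate = {g['downloads'] / g['cookies']:.3f}, "
          f"purchase rate = {g['licenses'] / g['cookies']:.3f}, "
          f"licenses / downloads = {g['licenses'] / g['downloads']:.3f}")

# old: download rate = 0.050, purchase rate = 0.005, licenses / downloads = 0.100
# new: download rate = 0.080, purchase rate = 0.006, licenses / downloads = 0.075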
Product usage statistics, like the average time the software was used during
the trial period, are potentially interesting, but they aren't directly related
to our experiment. We might not have a strong feeling about what kind of effect
the homepage will have on people who actually download the software. Stated
differently, product usage isn't a direct target of the homepage manipulation.
Certainly, these statistics might help us dig deeper into the reasons for
observed effects after an experiment is complete. They might even point toward
future changes and experiments to conduct. But in terms of experiment success,
product usage shouldn't be considered an invariant or evaluation metric.